Abstract

Background: Only a few transposable elements are known to exhibit site-specific insertion patterns, including the 
well-studied R-element retrotransposons that insert into specific sites within the multigene rDNA. The only known 
rDNA-specific DNA transposon, Pokey (superfamily: piggyBac), is found in the freshwater microcrustacean Daphnia
pulex. Here, we present a genome-wide analysis of Pokey based on the recently completed whole genome 
sequencing project for D. pulex. 

Results: Phylogenetic analysis of Pokey elements recovered from the genome sequence revealed the presence of 
four lineages corresponding to two divergent autonomous families and two related lineages of non-autonomous 
miniature inverted repeat transposable elements (MITEs). The MITEs are also found at the same 28S rRNA gene 
insertion site as the Pokey elements, and appear to have arisen as deletion derivatives of autonomous elements. 
Several copies of the full-length Pokey elements may be capable of producing an active transposase. Surprisingly, 
both families of Pokey possess a series of 200 bp repeats upstream of the transposase that are derived from the rDNA
intergenic spacer (IGS). The IGS sequences within the Pokey elements appear to be evolving in concert with the 
rDNA units. Finally, analysis of the insertion sites of Pokey elements outside of rDNA showed a target preference for 
sites similar to the specific sequence that is targeted within rDNA. 

Conclusions: Based on the target site preference of Pokey elements and the concerted evolution of a segment of 
the element with the rDNA unit, we propose an evolutionary path by which the ancestors of Pokey elements have 
invaded the rDNA niche. We discuss how specificity for the rDNA unit may have evolved and how this specificity 
has played a role in the long-term survival of these elements in the subgenus Daphnia. 

Keywords: Transposons, Daphnia, Pokey, Ribosomal DNA, Insertion specificity 



Background 

Transposable elements (TEs) are found in nearly all organisms and often comprise substantial portions of eukaryotic genomes [1]. Many TEs insert into locations throughout the genome, while others insert preferentially into specific sequences. A site preferred by non-long terminal repeat (non-LTR) retrotransposons is the locus encoding rRNA [2]. Pokey is the only example of a DNA transposon known to insert specifically in rDNA. Pokey inserts into the same 28S gene region that is highly targeted by non-LTR elements [3]. Insertion of any of these elements is expected to disrupt the production of functional rRNA from the inserted units.

rDNA consists of hundreds to thousands of units arrayed in tandem, each encoding one copy of the core 18S, 5.8S and 28S rRNAs. The many copies of each rRNA gene show high sequence identity, the product of recombinational processes termed concerted evolution (reviewed in [2]). The primary mechanism conferring high identity between copies is unequal crossing over, which also generates the large variation in rDNA copy number observed between members of the same species [4]. Because concerted evolution and selection against inserted units together tend to remove insertions from the locus, any element with a long-term presence in the rDNA must regularly generate new insertions to avoid being eliminated [4,5].
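To make the copy-number argument concrete, here is a toy sketch (not a model used in this study) of a single unequal crossover between two tandem arrays that paired out of register. Units are represented as string labels, the crossover points are chosen by hand, and the function name `unequal_crossover` is hypothetical.

```python
def unequal_crossover(array_a, array_b, break_a, break_b):
    """Swap the tails of two tandem arrays at the given unit indices.

    When break_a != break_b the arrays paired out of register, so one
    product gains units and the other loses them.
    """
    product_1 = array_a[:break_a] + array_b[break_b:]
    product_2 = array_b[:break_b] + array_a[break_a:]
    return product_1, product_2


# "u" = uninserted rDNA unit, "P" = unit carrying an insertion (hypothetical labels)
a = ["u", "P", "u", "u", "u"]
b = ["u", "u", "u", "u", "u"]
p1, p2 = unequal_crossover(a, b, break_a=2, break_b=4)
print(len(p1), len(p2))              # 3 7
print(p1.count("P"), p2.count("P"))  # 1 0
```

The total number of units is conserved (5 + 5 = 3 + 7), but each array changes size, and the inserted unit ends up in only one product. Repeated rounds of such exchange, combined with selection against inserted units, can eventually remove an insertion from the array.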

Pokey elements are members of the piggyBac superfamily of DNA-mediated TEs that insert into TTAA target sequences [3,6]. This element was first identified in the cladoceran crustacean Daphnia pulex and is now known to be widespread throughout the subgenus Daphnia. Unlike most DNA TEs, Pokey elements have undergone stable vertical inheritance for millions of years [7]. To the best of our knowledge, the only other organisms in which open reading frames (ORFs) similar to those in Pokey have been found are the silkmoth Bombyx mori [8], the tunicate Ciona savignyi [9], and the rotifer Adineta vaga [10]. Pokey elements have been found at multiple TTAA insertion sites throughout the genome and thus, like other piggyBac elements, appear to require little target-site conservation beyond the TTAA [3,11]. Nevertheless, Pokey elements have repeatedly been found at just one location in the 28S genes, despite the presence of over 30 TTAA motifs in the entire rDNA unit. While this finding suggests that sequence features in addition to TTAA may influence Pokey insertion, the frequency of independent Pokey insertions in the rDNA locus is not known. Thus, it is unclear whether rDNA acts as a sink or a source for Pokey elements, or whether there is free and ongoing exchange between Pokey elements in and outside the rRNA genes.



In this study, we used the original sequencing reads from the Daphnia genome sequencing project, available from the Trace Archives at GenBank, as well as the annotated scaffold sequences, to study Pokey elements and their interactions with 28S genes. The Pokey elements fall into two divergent lineages, each possessing a unique inverted terminal repeat (ITR) structure. Both lineages carry repeated copies of a segment from the intergenic spacer (IGS) region of the rDNA unit. In addition, two lineages of non-autonomous miniature inverted repeat transposable elements (MITEs) are present at the Pokey site in 28S genes and elsewhere in the genome. Finally, we found weak target sequence preferences for Pokey and the MITEs that are consistent with the site targeted in the 28S gene. We suggest that Pokey elements have evolved specificity for their 28S gene insertion site, and that their presence at this site has played a key role in their long-term survival in Daphnia by acting as a source of Pokey elements and their MITEs throughout the genome.

Results 

rDNA sequence variation 

Assembly of a consensus rDNA repeating unit from the Daphnia genome revealed a gene organization typical of most eukaryotes (Figure 1A, Additional file 1). The IGS separating transcription units in Daphnia starts with an 840 bp non-repetitive region, followed by a series of 323 bp repeats, and ends in a non-repetitive 3,115 bp region. The last region presumably includes the external transcribed spacer, but the transcription start site is not known. Ambrose and Crease [12] have shown that the 323 bp repeats are composed of two subrepeats of 200 bp and 123 bp, which usually (but not always) alternate with each other. Most (58%) IGSs in the genome sequence contain three 123-bp and four 200-bp repeats, with the remaining IGSs containing more copies of each repeat.

Figure 1 The structure of rDNA and its transposons in the Daphnia genome sequence. (A) The rDNA is an array of transcription units repeated in tandem and separated by the intergenic spacer (IGS), which contains a repeated sequence (purple boxes), as well as non-repeated regions. Pink boxes in the IGS and internal transcribed spacer (ITS)2 indicate the position of sequences found in the 5' noncoding region (NCR) of Pokey elements. (B) The structure of Pokey and Pokey-derived miniature inverted repeat transposable elements (mPok), which insert in a specific TTAA site in 28S rRNA genes. The approximate length of each region of a canonical autonomous Pokey element is given below the diagram. The black triangles at each end represent the inverted terminal repeats (ITRs). The dashed line in Pokey represents the highly length-variable region composed of both repetitive and unique sequences (more detail is provided in Figure 5). The green and orange regions correspond to regions that are similar in Pokey and mPok, the sequences of which are shown in Figure 2. (C) Sequence of the two types of ITRs. Complementary regions between the imperfect 5' and 3' ITRs are underlined. The TTAA target site duplication is shown in lower case.
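As a worked example of the arithmetic, the sketch below adds up the region sizes given above to estimate the length of the most common IGS configuration. It assumes every subrepeat has exactly its nominal length (200 or 123 bp), which real repeats need not; the constant and function names are hypothetical.

```python
HEAD_BP = 840        # non-repetitive region at the start of the IGS
TAIL_BP = 3115       # non-repetitive region at the end of the IGS
SUB_200_BP = 200     # longer subrepeat of the 323 bp repeat
SUB_123_BP = 123     # shorter subrepeat of the 323 bp repeat


def igs_length(n_200, n_123):
    """Approximate IGS length (bp) for a given number of each subrepeat."""
    return HEAD_BP + n_200 * SUB_200_BP + n_123 * SUB_123_BP + TAIL_BP


# Most common configuration: four 200-bp and three 123-bp subrepeats,
# i.e. alternating 200-123-200-123-200-123-200.
print(igs_length(4, 3))  # 5124
print(igs_length(5, 4))  # 5447, one additional 323 bp repeat
```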

Concerted evolution is expected to maintain very high sequence identity among all copies of the rDNA unit. The rDNA transcription units (external transcribed spacer through the 28S gene) in the sequenced genomes of Drosophila [13] and Nasonia [14] contain from 3 to 18 sites at which sequence variants are present in over 3% of units. In contrast, the Daphnia rDNA transcription unit and a 500 bp non-repetitive region from the IGS contain no sequence variants at the 3% threshold. This especially low level of rDNA variation is consistent with the very high level of homozygosity at allozyme and microsatellite markers observed in the sequenced Daphnia isolate [15], the low level of sequence variation in 28S genes from D. pulex in natural populations [16], and the high rate of recombination observed in the rDNA of a closely related species, Daphnia obtusa [17].
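The 3% criterion can be expressed as a per-column scan of aligned copies. The sketch below is a hypothetical helper, not the pipeline used in the cited studies; it assumes the copies (or reads) are already aligned to equal length, and it pools every non-consensus base at a column into a single variant frequency.

```python
from collections import Counter


def variant_sites(aligned_copies, threshold=0.03):
    """Return 0-based columns where non-consensus bases exceed `threshold`."""
    length = len(aligned_copies[0])
    if any(len(seq) != length for seq in aligned_copies):
        raise ValueError("copies must be aligned to the same length")
    sites = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned_copies)
        total = sum(counts.values())
        consensus_count = counts.most_common(1)[0][1]
        # Fraction of copies that differ from the consensus base at this column
        if (total - consensus_count) / total > threshold:
            sites.append(col)
    return sites


# Toy alignment of 100 copies: a 5% variant at column 1, a 2% variant at column 3
copies = ["ACGT"] * 93 + ["AGGT"] * 5 + ["ACGA"] * 2
print(variant_sites(copies))  # [1]
```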

Pokey elements in 28S genes and the genome 

A consensus sequence for Pokey copies was assembled from the original sequence reads of the D. pulex genome (Additional file 1). We also identified 69 elements containing intact ITRs at both ends from the annotated genome scaffolds at wFleabase. We aligned these sequences to two copies of Pokey elements from D. pulicaria rDNA, which were designated pcPokeyS (5 kb) and pcPokeyL (6.6 kb) [3]. As shown in Figure 1B, the Pokey elements contain either 12 or 16 bp imperfect ITRs, a 5' non-coding region (NCR), an ORF encoding a putative transposase, and a 3' NCR. The D. pulex copies were up to 9,800 bp in length (Additional file 2), with the majority of the length variation occurring in the 5' NCR (discussed below). Excluding this repetitive region, the canonical Pokey element is approximately 4,500 bp.

We also identified an additional 91 incomplete sequences, 400 to 4,400 bp in length, that lack either the 5' or 3' ITR, or both. The total number of Pokey elements based on these genomic searches was 160, similar to the 175 estimated by comparing the depth of coverage of Pokey sequence reads to the average coverage of single-copy genes [15]. We estimate that six of the 175 copies are inserted into 28S genes, and they all have the 12-bp ITRs.
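The coverage-based estimate mentioned above rests on a simple ratio: the mean read depth over the element divided by the mean depth over single-copy genes. A minimal sketch follows, with hypothetical function names and made-up input values (they are not data from this study). In practice the IGS-derived repeats in the 5' NCR would presumably need to be excluded from the element region, since reads from rDNA also match them and would inflate the depth.

```python
def mean_depth(aligned_bp, region_length_bp):
    """Mean read depth = total aligned read bases / region length."""
    return aligned_bp / region_length_bp


def estimate_copy_number(element_aligned_bp, element_length_bp, single_copy_depth):
    """Genomic copies of an element relative to single-copy genes."""
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return mean_depth(element_aligned_bp, element_length_bp) / single_copy_depth


# Hypothetical inputs: 3.6 Mb of reads on a 4,500 bp element; single-copy depth 8x
print(estimate_copy_number(3_600_000, 4_500, 8.0))  # 100.0
```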

A second type of sequence was also found at the Pokey insertion site in 28S genes and elsewhere in the genome. These elements were approximately 750 bp in length and contained sequences corresponding to the ends of the Pokey elements (Figure 1B). These shorter elements could be divided into two groups that contain the same imperfect 12 or 16 bp ITRs found in full-length Pokey elements, and thus are designated as MITEs. Sequence identity between the Pokey and MITEs extends for 160 bp at their 5' ends and 350 bp at their 3' ends (Figure 2). These regions contain repeat sequences that have been found in other piggyBac elements [18]. The central 250 bp region of the MITEs has no readily observed similarity to that of the Pokey elements.

Like the Pokey elements, MITEs found outside 28S genes also target TTAA sites, suggesting that they use the transposase of Pokey elements. We hereafter refer to these MITEs as mPok. About 25 to 30 copies of mPok were found in 28S genes, all with 12-bp ITRs. The total genome contains 90 to 110 copies, with 60 mPok sequences in the assembled scaffolds (Additional file 2).

Cluster analysis of Pokey and mPok elements 

A Neighbor-joining (NJ) tree was constructed from the consensus rDNA Pokey sequence, the pcPokeyS and pcPokeyL sequences from D. pulicaria, and 29 Pokey elements from the assembled genome scaffolds of D. pulex that contained full-length transposase sequences and less than 5% ambiguous base calls. The length-variable region of the 5' NCR (Figure 1B) was omitted from this analysis. The tree revealed two clusters with high bootstrap support, which will be referred to as the PokeyA and PokeyB families (Figure 3). The PokeyA cluster contains the two pcPokey elements described from D. pulicaria [3]. The PokeyB cluster contains a second paralogous lineage of Pokey elements previously identified by Penton and Crease [7] from D. obtusa (Additional file 3). All PokeyA elements contain the 16-bp ITR1, while PokeyB elements have the 12-bp ITR2. Average sequence divergence among the 11 PokeyA elements is 5.9%, while average divergence among the 18 PokeyB elements is 5.0% (Table 1). Divergence between the two groups averages 39.9%. Based on the sequence of their ITRs, 11 (15.9%) of the 69 elements obtained from the annotated scaffolds are PokeyA, while the remaining 58 (84.1%) are PokeyB (Additional file 2).
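The divergence values in Table 1 are Kimura 2-parameter (K2P) distances computed with the pairwise deletion option in MEGA4. The sketch below re-implements the textbook K2P formula, d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences; it is not MEGA's code. Pairwise deletion is modelled by dropping, for each pair separately, any column where either sequence has a gap or ambiguity code. Function names are hypothetical.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")
UNAMBIGUOUS = PURINES | PYRIMIDINES


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in UNAMBIGUOUS or b not in UNAMBIGUOUS:
            continue  # pairwise deletion: skip this column for this pair only
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent (saturated)
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def mean_distance(group1, group2=None):
    """Mean K2P distance within one group, or between two groups."""
    if group2 is None:
        pairs = [(x, y) for i, x in enumerate(group1) for y in group1[i + 1:]]
    else:
        pairs = [(x, y) for x in group1 for y in group2]
    return sum(k2p_distance(x, y) for x, y in pairs) / len(pairs)


# One transition in ten comparable sites; the gap/N column is ignored
print(round(k2p_distance("GAAAAAAAAA-", "AAAAAAAAAAN"), 4))  # 0.1116
```

Within-lineage values (the diagonal of Table 1) correspond to `mean_distance(group)`, and between-lineage values to `mean_distance(group1, group2)`.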

Figure 2 Partial alignment of Pokey and mPok consensus sequences. Nucleotides conserved across all four sequences are marked with asterisks. The inverted terminal repeats are underlined. Repeat 1 and 2 refer to sequences shown in Additional file 6. B = C, G or T, M = A or C, R = A or G, W = A or T. mPok, Pokey-derived miniature inverted repeat transposable element.

An NJ tree was also constructed with all 60 mPok sequences identified in the assembled scaffolds and the consensus rDNA mPok sequence. Two clusters with high bootstrap support were again observed (Figure 4), one sharing the 16 bp ITR1 with PokeyA (designated mPok1) and the other sharing the 12 bp ITR2 with PokeyB (designated mPok2). mPok2 elements (46 copies) are over three times as numerous as mPok1 elements (14 copies). Intragroup sequence divergence for mPok1 is only 2.2%. In the case of mPok2, there is a large cluster of elements (mPok2a, Figure 4) with low average sequence divergence (3.2%) and a second group (mPok2b) with much higher divergence (20.4%, Table 1). Inspection of the mPok2b sequences reveals few intact ITRs and numerous insertions and deletions, suggesting that they represent older copies of mPok2 that are no longer able to transpose. Divergence between mPok1 and mPok2a is 24.9% (Table 1), somewhat lower than the divergence estimates between full-length PokeyA and PokeyB elements.
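The trees in Figures 3 and 4 were built with the Neighbor-joining method. For readers unfamiliar with it, the sketch below implements the standard Saitou and Nei algorithm on a distance matrix (for example, one filled with K2P distances): at each step it joins the pair minimizing Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of distances from taxon i. It omits the bootstrap resampling behind the support values, and the function name is hypothetical.

```python
from itertools import combinations


def neighbor_joining(names, matrix):
    """Build an unrooted NJ tree; return (Newick string, list of joins).

    names: taxon labels; matrix: symmetric distance matrix (list of lists).
    Each join is recorded as (taxon_a, taxon_b, branch_a, branch_b).
    """
    nodes = list(names)
    dist = {}
    for i, a in enumerate(nodes):
        for j, b in enumerate(nodes):
            if i != j:
                dist[(a, b)] = matrix[i][j]
    joins = []
    while len(nodes) > 2:
        n = len(nodes)
        r = {a: sum(dist[(a, b)] for b in nodes if b != a) for a in nodes}
        # Q-criterion: the pair with the smallest Q value are neighbors
        a, b = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * dist[pair] - r[pair[0]] - r[pair[1]])
        branch_a = dist[(a, b)] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        branch_b = dist[(a, b)] - branch_a
        joins.append((a, b, branch_a, branch_b))
        new = f"({a}:{branch_a:g},{b}:{branch_b:g})"
        # Distance from the new internal node to every remaining taxon
        for k in nodes:
            if k not in (a, b):
                d_new = (dist[(a, k)] + dist[(b, k)] - dist[(a, b)]) / 2
                dist[(new, k)] = dist[(k, new)] = d_new
        nodes = [k for k in nodes if k not in (a, b)] + [new]
    a, b = nodes
    return f"({a},{b}:{dist[(a, b)]:g});", joins


# Distances from the additive tree ((A:2,B:3):4,(C:5,D:6))
names = ["A", "B", "C", "D"]
matrix = [[0, 5, 11, 12],
          [5, 0, 12, 13],
          [11, 12, 0, 11],
          [12, 13, 11, 0]]
tree, joins = neighbor_joining(names, matrix)
print(tree)  # ((A:2,B:3),(C:5,D:6):4);
```

Because these toy distances are perfectly additive, NJ recovers the generating tree and its branch lengths exactly; with real sequence distances the result is an approximation.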

Characterization of the Pokey transposase 

The ORF from the pcPokeyL element was originally reported by Penton and colleagues [3] to be 1,461 bp, encoding a protein of 487 amino acids. However, this coding region was suggested to contain a 68 bp intron (Y Bigot, personal communication), which, when spliced from an RNA transcript, would enable the production of a 668 amino acid protein. Pokey elements from D. pulex also appear to have this intron, which ranged in size from 68 to 74 bp in PokeyA and from 79 to 84 bp in PokeyB. Analysis of Pokey RNA transcripts by RT-PCR confirmed that the putative intron sequence can be spliced out [19].

PokeyA and PokeyB transposase genes encode conserved motifs shared among the transposase genes of diverse piggyBac elements [20]. These include a DDD (aspartic acid) motif (amino acid residues 436, 544 and 659) that is considered essential for transposase activity, an imperfect zinc finger motif that is believed to be either a chromatin-interacting Plant Homeo Domain or a protein-protein interaction domain, and a putative nuclear localization signal. Keith and colleagues [20] identified a fourth D residue C-terminal to the catalytic DDD triad. When they mutated this charged D to an uncharged N (asparagine) in a piggyBac construct, they observed a significant reduction in the transposition rate. This fourth residue is N instead of D in the D. pulicaria and D. pulex Pokey elements (Additional file 4). Partial sequences of Pokey transposase genes from other species in the subgenus Daphnia [7] all encode an N at this site.

Of the 69 elements identified from the assembled scaffolds, two PokeyA and two PokeyB elements may encode transposition-competent transposases (identified on the NJ tree in Figure 3). The ORFs of these elements lacked premature stop codons and contained all features known or inferred to be important for the transposition of piggyBac.
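Screening ORFs for premature stop codons after removing an intron can be sketched as below. This is a hypothetical helper, not the authors' pipeline: intron coordinates are supplied by hand (0-based, end-exclusive), only the standard stop codons are checked, and conserved motifs such as the DDD triad are not examined. The toy sequence is invented for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(seq, intron_start, intron_end):
    """Remove one intron given 0-based, end-exclusive coordinates."""
    return seq[:intron_start] + seq[intron_end:]


def premature_stops(cds):
    """Codon indices of in-frame stop codons other than the final codon."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]


def looks_intact(cds):
    """ATG start, whole number of codons, terminal stop, no premature stop."""
    cds = cds.upper()
    return (cds.startswith("ATG") and len(cds) % 3 == 0
            and cds[-3:] in STOP_CODONS and not premature_stops(cds))


# Toy gene: exon 1 (6 bp) + intron (12 bp, GT...AG) + exon 2 (6 bp)
unspliced = "ATGGCC" + "GTAAGTCTTTAG" + "AAATAA"
spliced = splice(unspliced, 6, 18)
print(premature_stops(unspliced))                      # [5]
print(looks_intact(unspliced), looks_intact(spliced))  # False True
```

Introns of 68 to 84 bp, as reported for Pokey, are not all multiples of three, so splicing can also shift the reading frame; an unspliced copy can therefore appear truncated even when the gene is intact.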

Repeated sequences in Pokey and mPok 

Penton and colleagues [3] noted the presence of a 200-bp repeat sequence (A repeats) in the 5' NCR of D. pulicaria Pokey elements that was derived from the IGS region of the rDNA unit. We also observed A repeats in the Pokey elements from D. pulex and note that they are usually preceded by a 48 bp sequence derived from ITS2 (Figure 5; see Figure 1 for the location of these sequences in the rDNA unit). The ITS2 repeat was termed C to differentiate it from an IGS-derived sequence previously designated as B in the pcPokeyL element [3]. All but one Pokey element from the annotated scaffolds contain both A and C repeats, with their copy number varying between 2 and 5 per element. Because of possible assembly errors in repeat regions, we cannot be certain of the exact repeat configuration of each element. However, the evidence does suggest that these regions are highly variable among elements. In addition, large tracts of sequence derived from areas of the Daphnia genome outside the rDNA units were inserted between the A repeats of several Pokey elements (Figure 5), suggesting that the 5' NCR frequently acquires non-element sequences from the genome.

Figure 3 Unrooted Neighbor-joining tree of full-length Pokey elements from the Daphnia genome. The 27 elements containing putative transposase genes and less than 5% ambiguous bases, and the consensus rDNA element (Con-rPokey), are included in the tree. The elements form two well-supported clusters that correspond to the inverted terminal repeat (ITR)1 and ITR2 sequences (Figure 1C). All positions containing gaps and missing data were eliminated in pairwise comparisons. Bootstrap values greater than 70 are indicated at the nodes of the tree.

Table 1 Sequence divergence between Pokey elements from the Daphnia genome

Lineage   PokeyA   PokeyB   mPok1   mPok2a   mPok2b
PokeyA    0.059
PokeyB    0.399    0.050
mPok1                       0.022
mPok2a                      0.249   0.032
mPok2b                      0.323   0.200    0.204

Diagonal values are mean divergences within each lineage. Estimates were calculated using the Kimura 2-parameter and pairwise deletion options in MEGA4. Estimates are based on separate alignments of PokeyA and PokeyB elements (28 sequences) and of mPok (61 sequences), so no Pokey versus mPok comparisons are given. mPok, Pokey-derived miniature inverted repeat transposable element.

We aligned the A repeat region of three D. pulex and three D. pulicaria ribosomal IGS sequences [12] to A repeats from all available Pokey elements (Additional file 5) and generated an NJ tree (Figure 6). The IGS sequences do not cluster separately from the Pokey repeats, nor do repeats from PokeyA and PokeyB elements form separate clusters relative to one another. Mean sequence divergence among the A repeats from all Pokey elements is only 5.3% (range 0 to 23.9%). In comparison, intraspecific sequence divergence in the region of the D. pulex and D. pulicaria IGS similar to A repeats is 1.8% [12]. This high sequence identity among the A repeats of the Pokey elements is in sharp contrast to the transposase sequences, where mean nucleotide sequence divergence between the PokeyA and B families is nearly 40%. These findings suggest there have been repeated exchanges between the PokeyA and PokeyB elements and the IGS sequences of the rDNA units.

Figure 4 Unrooted Neighbor-joining tree of 61 mPok sequences from the Daphnia genome. Con-rmPok is the consensus element in rDNA. The elements form two main clusters, denoted mPok1 and mPok2, that correspond to elements with inverted terminal repeat (ITR)1 and ITR2 sequences, respectively (Figure 1C). All positions containing gaps and missing data were eliminated in pairwise comparisons. Bootstrap values greater than 70 are indicated at the nodes of the tree. mPok, Pokey-derived miniature inverted repeat transposable element.

Figure 5 The 5' length-variable region of Pokey elements from the Daphnia genome. Examples of the 5' length-variable region containing A and C repeats are drawn to scale above the canonical element, also drawn to scale. PokeyA element names are in grey, and PokeyB element names are in black. The A repeats (approximately 200 bp) are similar to a sequence in the Daphnia intergenic spacer (IGS), while C repeats (approximately 50 bp) are similar to a sequence in the Daphnia internal transcribed spacer (ITS)2. Asterisks and bold type indicate variable positions in the aligned sequences. NCR, non-coding region. K = G or T, R = A or G, S = C or G, Y = C or T.

In addition to A and C repeats, there are three other short repeated sequences in the 5' and 3' NCRs of Pokey elements from the Daphnia genome that are shared with the mPok elements (Figure 2 and Additional file 6). Some of these repeats may correspond to repeat sequences previously found in other piggyBac elements, such as the one from Trichoplusia ni diagrammed in Additional file 6 [18].

Target site preferences for Pokey and mPok 

Previous characterization of Pokey target sites found no preference aside from the requisite TTAA observed for all piggyBac elements [21-23]. However, in contrast to piggyBac elements, about 10% of the Pokey and mPok insertions, all oriented in the 5' to 3' direction, were found with target site duplications other than TTAA (Table 2). These other insertion sites were mostly TTAT or ATAA, suggesting that the only essential nucleotides are the middle T and A. Insertion of piggyBac elements into non-TTAA sites has also been observed in transposition assays in bat (7.2%, [24]) and human cell lines (2.4%, [25]). In both cases, the alternate sites contained the middle T and A.
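Tallying target site duplications (TSDs) and checking for the shared central T and A is straightforward; the sketch below uses hypothetical function names and an invented list of TSDs rather than the data behind Table 2.

```python
from collections import Counter


def tsd_percentages(tsds):
    """Percentage of insertions flanked by each distinct TSD."""
    counts = Counter(t.upper() for t in tsds)
    total = sum(counts.values())
    return {site: 100 * n / total for site, n in counts.items()}


def has_central_ta(tsd):
    """True if a 4 bp TSD has T and A at its two middle positions."""
    return len(tsd) == 4 and tsd[1:3].upper() == "TA"


# Invented example: 8 canonical TTAA sites and two alternatives
tsds = ["TTAA"] * 8 + ["TTAT", "ATAA"]
print(tsd_percentages(tsds))  # {'TTAA': 80.0, 'TTAT': 10.0, 'ATAA': 10.0}
print(all(has_central_ta(t) for t in tsds))  # True
```

Every TSD type listed in Table 2 (TTAA, TTAT, ATAA and CTAA) passes the central-TA check.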

Within the rDNA locus, in contrast to insertions elsewhere in the genome, all Pokey and mPok elements but one are inserted at a single site in the 28S genes. The exception was an mPok sequence inserted into ITS2 near the sequence that gave rise to repeat C in Pokey elements. The specificity of Pokey elements for the 28S gene site, despite the presence of over 30 TTAA sites in the rDNA unit, suggests that a larger recognition sequence could be involved in Pokey insertions. We therefore re-evaluated the flanking sequences of Pokey and mPok insertions outside of 28S genes. About 23% of mPok and 7% of Pokey copies are inserted into the TTAA flanking another Pokey or mPok insertion (that is, they are organized as tandem repeats); these were excluded from the analysis. Visualization of preferred bases at specific sites revealed a weak preference for several bases immediately surrounding Pokey insertions (Figure 7). Significant sequence preferences included a C one base and a T four bases upstream (5') of the TTAA, and a total of eight preferred bases downstream (3') of the TTAA: a G at position 4, an A at position 7, the sequence AAATG at positions 11 to 15 and a T at position 18. Remarkably, each of these preferred bases matches the Pokey target site in the 28S gene.
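Figure 7 was produced with WebLogo. The core calculation, per-position base frequencies and information content (2 bits minus the Shannon entropy), can be sketched as follows, together with a hypothetical comparison against a target sequence such as the 28S site. This sketch omits WebLogo's small-sample correction and the significance test used in the study, the 0.4 frequency cut-off is an arbitrary assumption, and the toy flanks are invented.

```python
import math
from collections import Counter

BASES = "ACGT"


def position_profile(flanks):
    """Per-position base frequencies and information content (bits)."""
    profile = []
    for col in range(len(flanks[0])):
        counts = Counter(seq[col] for seq in flanks if seq[col] in BASES)
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"no unambiguous bases at position {col}")
        freqs = {b: counts[b] / total for b in BASES}
        entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
        profile.append((freqs, 2.0 - entropy))
    return profile


def positions_matching_target(profile, target, min_freq=0.4):
    """Positions where the most frequent base equals the target base."""
    hits = []
    for col, (freqs, _) in enumerate(profile):
        best = max(freqs, key=freqs.get)
        if best == target[col] and freqs[best] >= min_freq:
            hits.append(col)
    return hits


# Invented flanks (3 bp each) and an invented target sequence
flanks = ["GAC", "GTC", "GAT", "CAC"]
profile = position_profile(flanks)
print(positions_matching_target(profile, "GAT"))  # [0, 1]
```

Tandem insertions would be removed from `flanks` beforehand, since their flanking sequence is another element rather than the genomic target.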

Discussion 

Pokey diversity in the Daphnia genome 

Analysis of over 160 Pokey and Pokey-like sequences from the D. pulex genome revealed four well-supported clusters. Two clusters of larger elements with an average size of 5,100 bp were designated PokeyA and PokeyB. The clusters have diverged in sequence by about 40%, have different ITR structures and include members that possess an intact transposase ORF. The two other clusters are MITEs, designated mPok1 and mPok2, because each mPok element contains an ITR and other non-coding sequences corresponding to one of the full-length Pokey elements. Annotated PokeyB and mPok2 elements outnumber PokeyA and mPok1 elements by over 4:1.

Available evidence suggests that both PokeyA and PokeyB occur in D. obtusa [7], and thus the two lineages have likely persisted across multiple speciation events.

Figure 6 Unrooted Neighbor-joining tree of A repeats from the intergenic spacer (IGS) and Pokey elements in D. pulex and D. pulicaria. The tree was generated from the alignment in Additional file 5. All positions containing alignment gaps and missing data were eliminated in pairwise sequence comparisons.

Table 2 Target site duplications flanking Pokey elements in the Daphnia genome sequence

Target site duplication   % of insertions
TTAA                      88.13
TTAT                      5.93
ATAA                      5.09
CTAA                      0.85



Vertical diversification of TEs within the same genome can be driven by drift, selection, or, more likely, a combination of the two. Two models have been proposed. Lampe and colleagues [26] observed a loss of interaction between the ITRs and transposases of Tc1/mariner elements from different subfamilies with sequence divergence greater than 16%. They postulated that silencing mechanisms based on sequence similarity might create intragenomic selection that favors divergence of the transposase and ITR sequences of related TEs to escape silencing. A second possibility is that the presence of numerous non-autonomous elements drives the divergence of transposase and ITR sequences, because the non-autonomous copies titrate the transposase away from autonomous copies and decrease their fitness [27]. In that case, intragenomic selection might favor divergent elements whose transposases can only recognize their own ITRs.

The ability of PokeyA and PokeyB elements to cross-mobilize could be investigated using yeast excision, yeast one-hybrid and/or electrophoretic mobility shift assays to determine the strength of interaction between the transposases and ITRs of each group. Although the differences in sequence between the two ITR structures appear minor (Figure 1C), Casteret and colleagues [28] demonstrated that a small number of single nucleotide changes to the ITR of the drosophilid DNA transposon Mos1 produced significant changes in transposition rate.

The mPok elements appear to be of an atypically large size (approximately 750 bp) compared to other MITEs, which can be as small as around 130 bp [29]. However, MITEs that are even larger than mPok have now been discovered in phylogenetically diverse eukaryotes (reviewed in [30]), suggesting that large MITEs are more common than once thought. One mechanism to explain the origin of large MITEs is progressive internal deletion of autonomous DNA TEs and subsequent selection for increasing transposition rate among the resultant elements over time [31]. Under this model MITEs shrink as they age, so the larger size of mPok elements could indicate a recent origin. While this could be true for the mPok1 elements, which show little sequence diversity, the occurrence of highly divergent mPok2b copies is not consistent with a recent origin (Figure 4). Indeed, Depra and colleagues [32] suggested that the Mar MITEs in Drosophila willistoni, which are similar in size to the mPok elements, may have originated prior to the diversification of the willistoni subgroup 5.7 MYA, suggesting that large size does not necessarily indicate recent origin.

Figure 7 Sequences flanking the TTAA target site of Pokey and Pokey-derived miniature inverted repeat transposable elements (mPok). WebLogo was used to analyze 26 bp upstream and 26 bp downstream of the TTAA site. The analysis is based on 130 unique sequences upstream and 162 sequences downstream of the target site from the Daphnia genome. The corresponding sequence in the Daphnia 28S gene is shown below the graph. Positions in the 28S gene that match preferred positions in genomic insertion sites are in bold type.

Repeated sequences in Pokey 

An unusual aspect of the PokeyA and B lineages in Daphnia is the presence of sequences derived from NCRs of the rDNA unit. This includes an approximately 200 bp sequence from a non-repetitive region of the IGS (A repeats) and an approximately 50 bp sequence from ITS2 (C repeats) (Figures 1 and 5). Pokey elements contain from 2 to 5 copies of these rDNA sequences within their 5' NCR (Figure 5). The highly recombinogenic nature of these repeats within the Pokey elements was first suggested by their differential spacing in pcPokeyS and pcPokeyL [3] and is strongly supported by this analysis, in which particular combinations of A and C repeats are unique to only one or a few Pokey elements.

The acquisition of DNA by the 5' NCR of Pokey does not appear to be limited to rDNA. For example, Pokey62 contains a unique, approximately 3,600 bp sequence, of which approximately 1,100 bp is derived from sequence on a non-rDNA scaffold in the Daphnia genome. Thus, Pokey elements often acquire sequences from their host's genome. Langer and colleagues [33] proposed that Ds elements could acquire host sequence if the transposase slides after binding but before cutting, or if cryptic ITR-like sequences exist downstream of an element. However, the acquisition of sequences well within the 5' NCR of the Pokey elements argues against such a simple explanation (Figure 5).

What is the significance of the A and C repeats? It is 
possible that they have no function and that their origin 
was chance recombination events that had no fitness im- 
pact on Pokey. However, the finding that all but one 
copy of Pokey from both lineages contain these repeats 
suggests that they do play some role in Pokey activity. 
Possible functions of these sequences include transcription
enhancers, transcription terminators to prevent the
formation of aberrant rRNA read-through transcripts, or
binding sequences to recruit epigenetic modifiers
[34-36]. We consider a role in transcription to be the
most likely, because mPok elements, which do not need
to be transcribed to be mobilized by a Pokey transposase,
lack the rDNA repeats.

The most remarkable property of the A repeats is that 
the same sequence was retained in both the PokeyA and 
B lineages. Not only do the A repeats correspond to the 
highest level of sequence conservation between the two 
lineages, but the A repeats within the two Pokey lineages 
are as well conserved as IGS sequences undergoing con- 
certed evolution within the rDNA unit (Figure 6). This 
high level of sequence identity suggests that recombin- 
ation between the Pokey repeats and the rDNA repeats 
occurs on a regular basis, thus strengthening the argu- 
ment that Pokey elements have become highly special- 
ized for their insertion into the rDNA locus. 

Target site selection and the rDNA niche 

While it is not possible to assemble the sequences of in- 
dividual Pokey elements inserted in rDNA, it should be 
noted that the consensus Pokey sequence from rDNA is 
similar to an assembled non-rDNA copy that could pu- 
tatively encode a functional transposase (Figure 3). 
Given the rapid turnover of rDNA units, Pokey elements 
within the locus should be among the newest insertions, 
while those outside of rDNA are a combination of new 
and old insertions. 

Pokey is the only DNA-mediated TE that is known to
have evolved insertion specificity for the rDNA unit.
Remarkably, the Pokey insertion site is in the same region of the
28S gene that is also the target site for a number of non- 
LTR retrotransposons (reviewed in [2]). Two of these el- 
ements, R2 and R5, which insert within a few base pairs 
of the Pokey site, encode related endonucleases that have 
an active site similar to class IIS restriction enzymes 
[37,38]. The R2 endonuclease has been shown to have 
exceptional specificity for the 30 to 40 nucleotides 
surrounding its insertion site [39,40]. Two other non-LTR
retrotransposons, R1 and R4, insert 75 and 28 bp,
respectively, downstream of the Pokey site. These elements
encode an endonuclease with similarity to the
apurinic endonuclease involved in DNA repair [41]. The
endonuclease encoded by R1 has also been shown to
have sequence specificity for the insertion site [41,42]. In
all four cases, most copies of the element are inserted in
rDNA, and most copies outside rDNA are inserted into sites
with sequence similarity to the 28S gene target site
[13,43].

The transposase of Pokey elements represents a third 
protein that has evolved specificity for this region of 
the 28S gene. Some of the best-studied examples of 
integrases that have evolved insertion specificity involve 
the LTR retrotransposons of yeast [44-47]. In these 
cases, the integrases have evolved protein-protein speci- 
ficity for association with specific transcription factors 
or chromatin structural components rather than actual 
DNA sequence specificity. Such protein-chromatin inter- 
actions could also be involved in the insertion specificity 
of Pokey elements, but we are not aware of any specific 
chromatin components that are bound to the central region
of 28S genes. Alternatively, the A repeat associated
with Pokey elements may contain a recognition site for a
nucleolar protein that helps guide excised Pokey elements
that are ready for insertion into the nucleolus.

It seems a remarkable coincidence that three different 
lineages of TEs have evolved specificity for the same 
small region of the rDNA unit. The 28S target region is 
highly conserved, but there are many regions of the 18S 
and 28S genes that are conserved across eukaryotes. We
suggest that either the DNA in this region is highly exposed
and thus accessible to the TE machinery, a yet unknown
chromatin component can be utilized by the TE in its
evolution of specificity, or this is one of only a few areas
of the rDNA where a TE can insert without being
quickly eliminated by recombination or selected against
because it leads to the synthesis of disrupted rRNA.

Based on the concordance between phylogenies of 
rDNA Pokey elements and their hosts, Penton and 
Crease [7] concluded that Pokey has undergone stable, 
vertical inheritance in the rDNA of species in the sub- 
genus Daphnia since its origin. Thus, unlike most Class 
II TEs, Pokey elements appear to have evaded complete 
silencing by the host for millions of years. The unique 
breeding system of Daphnia, involving extended periods 
of apomictic reproduction, and the complete loss of 
sexuality in some lineages may have created strong se- 
lection pressure on ancestral Pokey elements to avoid 
causing deleterious mutations in their host, while still 
maintaining a transposition rate high enough to survive. 
The theory describing the interaction between TEs and
asexual or partially asexual hosts predicts three possible
outcomes: (1) active elements are lost, (2) the host goes
extinct due to TE-induced mutation, or (3) the elements
become domesticated and the threat is neutralized [48].
However, Pokey's invasion of rDNA suggests a fourth
outcome: the long-term persistence of active elements.

Zhou and colleagues [5] have argued that rDNA is an 
ideal TE niche, because it is difficult for the host to com- 
pletely silence elements that have inserted into genes 
that must be expressed. In addition, TEs inserted in the 
locus are continually removed by recombination events,
so old copies that could interfere with the elements are 
eliminated. Finally, each insertion has a predictable, 
small effect on the fitness of the host. This effect is small 
because all organisms contain more than enough rDNA 
for the production of rRNA, and those rDNA units with 
insertions are usually not transcribed [49]. R2 and R1
elements, which are abundant in the rDNA of arthropods
including crustaceans, have not been found in Daphnia. 
Perhaps Pokey elements are even better adapted for this 
niche in that they can be lost from the rDNA locus, but 
copies located outside the rDNA can on occasion be ac- 
tive and re-establish insertions in the locus. Indeed, indi- 
vidual D. pulex that lack PokeyA in rDNA have been 
observed, but no individuals have been observed that 
completely lack Pokey elements [11,19,50-52].

Conclusions 

In spite of what would appear to be an inhospitable
location for a DNA transposon, Pokey has
evolved specificity for a site in the 28S genes of Daph- 
nia. Analysis of both the annotated D. pulex genome 
and the raw trace files revealed that rDNA units display 
extremely low levels of sequence variation consistent 
with the high rates of recombination previously observed 
for this locus. Nevertheless, Pokey has diversified into two
lineages of autonomous elements, PokeyA and PokeyB,
which appear to have persisted across multiple speci- 
ation events. While members of the B lineage are located 
in the rDNA of the population in Oregon that was se- 
lected for genomic sequencing [15], members of the A 
lineage are in the rDNA of D. pulicaria and D. pulex 
populations outside Oregon [3,7,52]. Both Pokey lineages 
have given rise to two parallel lineages of MITEs, mPok1
and mPok2, which appear to be deletion derivatives of
the full-length elements. 

Part of the specificity of Pokey elements can be
attributed to the sequence specificity of the transposase itself, as
the target site of non-rDNA copies bears weak sequence 
similarity to the 28S rRNA insertion site. However, both 
Pokey lineages possess repeat sequences derived from 
rDNA that vary in arrangement and copy number. These 
repeats may play a role in the expression of Pokey ele- 
ments from the rDNA locus, and/or a role in insertion 
specificity. Whatever their function, the Pokey repeats
are evolving in concert with each other and with the
rDNA unit itself, suggesting ongoing sequence exchange.
It remains unknown whether Pokey elements in or out
of the rDNA locus are more active, and what fraction of
new insertions occur in rDNA. While more insertions 
are found outside rDNA, this could simply reflect the 
fact that non-rDNA insertions are more stable over time. 
Overall, our results suggest a complex interaction be- 
tween Pokey and its host, and highlight the need to con- 
centrate not only on host traits but also on traits of 
individual families when trying to understand the 
current dynamics and past evolutionary history of TEs. 

Methods 

Search for and assembly of rDNA and Pokey elements 

The original sequencing reads of the genome sequencing
project for the cladoceran crustacean Daphnia pulex
were searched with the basic local alignment search tool
(BLAST) [53,54] in the Trace Archives at GenBank [55].
In addition, BLAST searches were conducted against the
assembled scaffolds at wFleaBase [56].

The search for Pokey elements in 28S genes was 
conducted in the same manner as searches for other 
28S-specific TEs in rDNA [57]. Briefly, a BLAST search 
was conducted using the downstream region flanking 
the Pokey insertion site as the query. Reads identified in 
this search were examined upstream of the query region 
for sequences that were not 28S and thus putative TEs. 
Once the consensus of the TE end was acquired, itera- 
tive BLAST searches were conducted using the end of 
each newly acquired TE extension until the 5' junction 
of the element with the 28S gene was reached. In order 
to identify copies present outside 28S genes, the ends of 
the TE consensus sequences were used as BLAST quer- 
ies and the flanking sequences examined. Sequences of 
the putative transposase gene were analyzed using the 
PSORTII server [58] to identify features of the amino 
acid sequence. 
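The iterative extension toward the 5' junction can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: exact substring matching stands in for BLAST, reads are assumed to be error-free and already in the same orientation as the 28S gene, and the function names (`consensus_upstream`, `walk_upstream`) and the query length `k` are hypothetical.

```python
from collections import Counter


def consensus_upstream(reads, anchor):
    """Majority-vote consensus of the sequence lying 5' of `anchor` in every
    read that contains it (reads assumed to share the 28S orientation)."""
    prefixes = []
    for read in reads:
        pos = read.find(anchor)
        if pos > 0:
            # Reverse so that index 0 is the base immediately 5' of the anchor.
            prefixes.append(read[:pos][::-1])
    if not prefixes:
        return ""
    consensus = []
    for i in range(max(len(p) for p in prefixes)):
        column = Counter(p[i] for p in prefixes if i < len(p))
        consensus.append(column.most_common(1)[0][0])
    return "".join(consensus)[::-1]


def walk_upstream(reads, flank_28s, junction_28s, k=20, max_rounds=100):
    """Extend from the 28S sequence downstream of the insertion site toward
    the 5' junction, using the first k bases of the contig as each new query.
    Returns (contig, reached_junction)."""
    contig = flank_28s
    for _ in range(max_rounds):
        if junction_28s in contig:
            return contig, True
        extension = consensus_upstream(reads, contig[:k])
        if not extension:
            return contig, False  # no read extends the contig any further
        contig = extension + contig
    return contig, junction_28s in contig
```

Taking a majority vote across all reads that contain the query is what makes each extension a consensus rather than a single read; with real trace data, repeated k-mers inside an element would require a longer query or an alignment-based search such as BLAST.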

Cluster analysis of Pokey elements 

Pokey elements were aligned using a combination of the 
CLUSTAL, MUSCLE and MAFFT multiple sequence 
alignment programs available from the EMBL-EBI web- 
site [59]. Alignments were manually adjusted in the 
program BioEdit [60]. Only sequences with less than 
5% ambiguous bases across the aligned region and 
containing an ITR at both ends were used in cluster ana- 
lyses. Measurements of pairwise sequence divergence 
were calculated using the Kimura 2-parameter method 
[61] in MEGA4 [62]. NJ trees [63] were also constructed 
in MEGA4. Bootstrap analysis was performed on 1000 
pseudo-replicates for each tree [64]. The alignment of 
full-length elements excluded the variable repeat region
between the 5' ITR and the transposase gene. In
addition, a dataset including the last approximately 
1,600 bp of the 3' end of rDNA Pokey elements from 
species in the subgenus Daphnia [7] was aligned with 
the Pokey elements from the Daphnia genome sequence 
and used to generate an NJ tree. 
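The sequence filter and distance measure used for the cluster analyses can be sketched in Python. The study used MEGA4; the code below is an independent, hypothetical reimplementation that assumes aligned sequences stored as strings, with `-` for gaps and IUPAC codes for ambiguous bases, and it omits the check for an ITR at both ends. The Kimura 2-parameter distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transition and transversion differences; sites with a gap or ambiguity in either sequence are skipped (pairwise deletion, as described for Additional file 3). `bootstrap_replicate` shows how one pseudo-replicate is drawn by resampling alignment columns with replacement.

```python
import math
import random

IUPAC_AMBIGUOUS = set("NRYKMSWBDHV")
PURINES = set("AG")
PYRIMIDINES = set("CT")


def ambiguity_fraction(seq):
    """Fraction of non-gap positions that carry an IUPAC ambiguity code."""
    bases = [b for b in seq.upper() if b != "-"]
    return sum(b in IUPAC_AMBIGUOUS for b in bases) / len(bases)


def passes_filter(seq, max_ambiguous=0.05):
    """Keep sequences with less than 5% ambiguous bases (ITR check omitted)."""
    return ambiguity_fraction(seq) < max_ambiguous


def k2p_distance(a, b):
    """Kimura 2-parameter distance with pairwise deletion of gaps/ambiguities."""
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # gap or ambiguous base in either sequence
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def bootstrap_replicate(alignment, rng):
    """One pseudo-replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}
```

A complete analysis along these lines would compute the distance for every pair of filtered sequences, build a neighbor-joining tree from the resulting matrix, and repeat the tree building on 1000 pseudo-replicates to obtain bootstrap support.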

Sequence variation in rDNA and Pokey elements 

Sequence variation present in the rDNA transcription 
units and in a 500 bp region of the IGS was evaluated in 
the same manner as described by Stage and Eickbush 
[13]. Briefly, 525 bp overlapping regions of each consen- 
sus were used as BLAST queries in the trace archives. 
Approximately 250 reads were collected from each 
BLAST search and evaluated for sequence changes present 
in at least eight sequence reads. In order to screen out se- 
quencing errors, sites containing sequence differences 
were further evaluated using the trace quality scores avail- 
able through the trace archives at GenBank [55]. 
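The read-support screen can be sketched as a per-position pileup filter. This is a hypothetical simplification that folds the two steps into one: only base calls at or above an assumed Phred cutoff of 20 (no value is given in the text) count toward the eight-read threshold. The pileup structure, mapping each consensus position to the `(base, quality)` calls of the reads aligned over it, is also an assumption.

```python
from collections import Counter

MIN_READS = 8      # from the text: a change must be seen in at least eight reads
MIN_QUALITY = 20   # hypothetical Phred cutoff; no value is given in the text


def screen_variants(pileup, min_reads=MIN_READS, min_quality=MIN_QUALITY):
    """Return {position: {base: count}} for non-majority bases supported by
    at least `min_reads` calls with quality >= `min_quality`.

    `pileup` maps a consensus position to the (base, phred_quality) calls
    from the reads aligned over it.
    """
    variants = {}
    for position, calls in pileup.items():
        if not calls:
            continue
        majority = Counter(base for base, q in calls).most_common(1)[0][0]
        high_quality = Counter(
            base for base, q in calls
            if base != majority and q >= min_quality
        )
        supported = {b: n for b, n in high_quality.items() if n >= min_reads}
        if supported:
            variants[position] = supported
    return variants
```

With roughly 250 reads per query, a change seen in eight high-quality reads corresponds to about 3% of the sampled units, so this screen targets variants present in a minority of rDNA units rather than isolated sequencing errors.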

A total of 26 base pairs on each side of the Pokey inser- 
tion site of both Pokey and mPok elements, all oriented in 
the 5' to 3' direction, were compared to determine if a 
preferred base is present at each position. A graphical rep- 
resentation of sequence conservation was made using 
WebLogo [65]. Only the 4 bp upstream and 15 bp down- 
stream of the insertion contain preferred bases. 
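The per-position conservation that a sequence logo displays can be computed directly. The sketch below is an assumption and not WebLogo's own code: it takes equal-length aligned flanks and returns the information content of each column in bits, without a small-sample correction. Because the upstream (130) and downstream (162) sets differ in size, it would be applied to each set separately; the `cutoff` in `preferred_positions` is a hypothetical threshold for calling a preferred base.

```python
import math
from collections import Counter


def column_information(sequences):
    """Information content (bits) of each column of equal-length DNA sequences.

    R = 2 - H, where H is the Shannon entropy of the observed base
    frequencies in the column; 2 bits = log2(4) is the maximum for DNA.
    No small-sample correction is applied.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must all have the same length")
    info = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sequences)
        n = sum(counts.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)
    return info


def preferred_positions(info, cutoff=0.5):
    """Indices whose information content exceeds a hypothetical cutoff."""
    return [i for i, bits in enumerate(info) if bits > cutoff]
```

A column where every insertion carries the same base scores 2 bits, and a column with all four bases equally frequent scores 0, which is why only a handful of positions next to the TTAA site stand out.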

Analysis of repeat sequences in Pokey 

Identification of repeat sequences within Pokey, and com- 
parisons between Pokey and rDNA were performed using 
Pustell DNA matrix in MacVector 10.0 (MacVector Inc., 
Cary, NC, USA). Default parameters were used (80%
sequence identity in a 16 bp window).
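A Pustell-style dot matrix reports every pair of windows whose identity reaches the cutoff. The function below is a minimal sketch of that idea using the stated parameters (16 bp window, 80% identity, so at least 13 matching bases); it is not MacVector's implementation, it ignores the reverse strand, and its name is hypothetical.

```python
import math


def dot_matrix_hits(seq_a, seq_b, window=16, min_identity=0.80):
    """Return (i, j) start positions of every pair of ungapped windows of
    length `window` in seq_a and seq_b whose identity is >= min_identity.
    Forward strand only; brute force, O(len(a) * len(b) * window)."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    needed = math.ceil(window * min_identity)  # 16 bp x 80% -> 13 matches
    hits = []
    for i in range(len(seq_a) - window + 1):
        segment = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            matches = sum(x == y for x, y in zip(segment, seq_b[j:j + window]))
            if matches >= needed:
                hits.append((i, j))
    return hits
```

Comparing a Pokey sequence with itself would show internal repeats as runs of off-diagonal hits, and comparing it with the rDNA unit would locate the rDNA-derived segments; the brute-force loop is adequate for sequences of a few kilobases.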

Additional files 



Additional file 1: Consensus sequences of the rDNA unit, Pokey and 
mPok from the Daphnia genome sequence. The sequences are 
provided in Fasta format. The highly length-variable region at the 5' end 
of Pokey elements has been omitted and is indicated by several Xs. 

Additional file 2: List of Pokey elements extracted from the 
annotated scaffolds of the Daphnia genome sequence. The scaffold 
number (S), first nucleotide position (nt), length in bp (length) and 
lineage (PokeyA or B, mPok1 or 2) is provided for each sequence. NJ,
Neighbor-joining tree. 

Additional file 3: Unrooted Neighbor-joining tree of 1600 bp 
sequences from the 3' end of Pokey elements. Elements from the 
Daphnia genome sequence and cloned from the rDNA of other species 
in the subgenus Daphnia [7] are included. The latter are preceded by PC.
All positions containing alignment gaps and missing data were 
eliminated in pairwise sequence comparisons. Bootstrap values greater 
than 70 are shown at the nodes in the tree. 

Additional file 4: Partial alignment of transposase amino acid 
sequences from Pokey and piggyBac-superfamily elements. The
three conserved catalytic aspartic acid (D) residues, the four cysteine (C)
residues thought to compose the zinc-finger/Plant Homeo Domain (PHD) 
motif and the putative nuclear localization signal (NLS) are highlighted. 
The asparagine (N) residue conserved in Pokey transposases is 
highlighted in grey. Other piggyBac elements have D at this position.
pB-Bmor, putative Bombyx mori piggyBac transposase; pB-Harm, piggyBac 
transposase from Helicoverpa armigera; pB-Xtro, piggyBac transposase 
from Xenopus tropicalis; pB-like-Hsap, piggyBac transposase-derived 
protein from Homo sapiens. 

Additional file 5: Alignment of A repeats from the IGS and Pokey 
elements in Daphnia pulex and Daphnia pulicaria. The sequences of 
76 Pokey A repeats and the corresponding sequence from three 
ribosomal IGS from each of D. pulex and D. pulicaria [12] are provided in 
Fasta format. The order of the repeat within an element is given after the 
element name (for example, Pokey11-3 is copy 3 in element 11). The
number of A repeats ranges from 2 to 5 per element. 

Additional file 6: Repeated sequences in Pokey with similarity to 
piggyBac elements. The approximate location of repeat sequences in 
piggyBac that lack primary sequence identity with those in Pokey, but 
occur in similar locations, are indicated for both elements. The dashed 
line in Pokey presents the repetitive region described in Figure 5. The 
repetitive region, 5' NCR and transposase genes are not drawn to scale. 
NCR, non-coding region; tpase, transposase gene. 